A Comparative Study of Hard and Fuzzy Data Clustering Algorithms with Cluster Validity Indices

Authors

O. A. Mohamed Jafar

R. Sivakumar

Abstract

Data clustering is one of the important data mining methods. It is the process of partitioning a data set into classes such that similarity is highest within a class and dissimilarity is highest between classes. The well-known hard clustering algorithm (K-means) and fuzzy clustering algorithm (FCM) are mostly based on the Euclidean distance measure. In this paper, a comparative study of these algorithms with different distance measures, such as Chebyshev and Chi-square, is presented. The new algorithms are tested on four well-known data sets from the UCI repository: Contraceptive Method Choice (CMC), Diabetes, Liver Disorders and Statlog (Heart). Experimental results show that FCM based on the Chi-square distance measure gives better results than with the Chebyshev distance measure. We also propose an FCM algorithm based on the σ-distance measure. The FCM algorithm is also evaluated with cluster validity indices such as the partition coefficient and partition entropy. The results show that the Chebyshev distance measure yields the maximum partition coefficient and the minimum partition entropy among the distance measures considered. This paper also provides a brief review of applications of the K-means and fuzzy c-means algorithms.
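The abstract does not give formulas for the quantities it names, so the following is only a minimal sketch using common textbook definitions, not the authors' implementation. The function names are hypothetical. The Chi-square form used here, sum of (a - b)^2 / (a + b), is one of several variants in the literature and is an assumption, as is the convention of skipping terms where a + b is zero. The membership matrix `u` is assumed to have one row per data point, with each row summing to 1.

```python
import math


def chebyshev(x, y):
    """Chebyshev distance: largest absolute coordinate-wise difference."""
    return max(abs(a - b) for a, b in zip(x, y))


def chi_square(x, y):
    """Chi-square distance, assumed form: sum of (a - b)^2 / (a + b).

    Intended for non-negative features; terms with a + b == 0 are skipped.
    """
    return sum((a - b) ** 2 / (a + b) for a, b in zip(x, y) if a + b != 0)


def partition_coefficient(u):
    """Mean of squared memberships over all N points (rows of u)."""
    n = len(u)
    return sum(m * m for row in u for m in row) / n


def partition_entropy(u):
    """Negative mean of m * ln(m) over all points; zero memberships add 0."""
    n = len(u)
    return -sum(m * math.log(m) for row in u for m in row if m > 0) / n


if __name__ == "__main__":
    # Hypothetical memberships for three points and two clusters.
    u = [[0.9, 0.1], [0.2, 0.8], [0.5, 0.5]]
    print(chebyshev([1, 5], [4, 3]), chi_square([1, 3], [3, 1]))
    print(partition_coefficient(u), partition_entropy(u))
```

Under these definitions, with c clusters the partition coefficient lies between 1/c and 1 and the partition entropy between 0 and ln(c). A crisper partition has a higher coefficient and a lower entropy, which is the sense in which the abstract speaks of maximum partition coefficient and minimum partition entropy.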

Similar resources

A Multi-Objective Approach to Fuzzy Clustering using ITLBO Algorithm

Data clustering is one of the most important areas of research in data mining and knowledge discovery. Recent research in this area has shown that the best clustering results can be achieved using multi-objective methods. In other words, assuming more than one criterion as objective functions for clustering data can measurably increase the quality of clustering. In this study, a model with two ...

Clustering of Fuzzy Data Sets Based on Particle Swarm Optimization With Fuzzy Cluster Centers

In the current study, a particle swarm clustering method is suggested for clustering triangular fuzzy data. The proposed method finds fuzzy cluster centers; the more points from the corresponding cluster a fuzzy cluster center contains, the higher the clustering accuracy. Also, triangular fuzzy numbers are utilized to represent uncertain data. To compare triangular fuzzy ...

EXCLUVIS: A MATLAB GUI Software for Comparative Study of Clustering and Visualization of Gene Expression Data

The result of one clustering algorithm varies from that of another for the same input dataset, as the input parameters of an algorithm can substantially affect its behavior and execution. Cluster validity measures can be used to find the partitioning that best fits the underlying data. In most realistic applications, this analysis can be visualized using simple Computer-Aided-...

A New Validity Measure for Heuristic Possibilistic Clustering

A heuristic approach to possibilistic clustering is an effective tool for data analysis. The approach is based on the concept of allotment among fuzzy clusters. To establish the number of clusters in a data set, a validity measure is proposed in this paper. An illustrative example of the application of the proposed validity measure to Anderson’s Iris data is given. A comparison of the vali...

Fuzzy Cluster Quality Index using Decision Theory

Clustering can be defined as the process of grouping physical or abstract objects into classes of similar objects. It is an unsupervised learning problem of organizing unlabeled objects into natural groups in such a way that objects in the same group are more similar than objects in different groups. Conventional clustering algorithms cannot handle uncertainty that exists in the real life...

Publication date: 2013
